Blueprints for Text Analytics Using Python: Machine Learning-Based Solutions for Common Real World (NLP) Applications

Authors: Jens Albrecht, Sidharth Ramachandran, Christian Winkler
Publisher: O'Reilly
Published: 2020-12-29
ISBN-13: 9781492074083
ISBN-10: 149207408X
Binding: Quality paper (also called trade paper)
Pages: 424





Description


Turning text into valuable information is essential for many businesses looking to gain a competitive advantage. Natural language processing has improved considerably, and practitioners have many options when approaching a problem. However, it's not always clear which NLP tools or libraries would work for a business use case, or which techniques you should use and in what order.

This practical book provides theoretical background and real-world case studies with detailed code examples to help developers and data scientists obtain insight from text online. Authors Jens Albrecht, Sidharth Ramachandran, and Christian Winkler use blueprints for text-related problems that apply state-of-the-art machine learning methods in Python.

If you have a fundamental understanding of statistics and machine learning along with basic programming experience in Python, you're ready to get started. You'll learn how to:

Crawl and clean, then explore and visualize, textual data in different formats
Preprocess and vectorize text for machine learning
Apply methods for classification, topic analysis, summarization, and knowledge extraction
Use semantic word embeddings and deep learning approaches for complex problems
Work with Python NLP libraries like spaCy, NLTK, and Gensim in combination with scikit-learn, Pandas, and PyTorch


About the Authors


Jens Albrecht is a full-time professor in the Computer Science Department at the Nuremberg Institute of Technology. His work centers on data management and analytics, with a focus on text. He holds a doctoral degree in computer science. Before he rejoined academia in 2012, he worked for over a decade in industry as a consultant and data architect. He is the author of several articles on Big Data management and analysis.

Sidharth Ramachandran currently leads a team of data scientists at GfK, helping to build data products for the consumer goods industry. He has over 10 years of experience in software engineering and data science across the telecom, banking, and marketing industries. Sidharth also co-founded WACAO, a smart personal assistant on WhatsApp that was featured on TechCrunch. He holds an undergraduate engineering degree from IIT Roorkee and an MBA from IIM Kozhikode. Sidharth is passionate about solving real problems through technology and loves to hack on personal projects in his free time.

Christian Winkler is a data scientist and machine learning architect. He holds a PhD in theoretical physics and has been working in the field of large data volumes and artificial intelligence for 20 years, with a particular focus on scalable systems and intelligent algorithms for mass text processing. He is the founder of datanizing GmbH, a speaker at conferences, and the author of articles on machine learning and text analytics.




Related Books

Game Programming with Code Angel: Learn How to Code in Python on Raspberry Pi or PC

Author: Mark Cunningham

2020-12-29

Data Science on the Google Cloud Platform: Implementing End-To-End Real-Time Data Pipelines: From Ingest to Machine Learning, 2nd Edition

Author: Valliappa Lakshmanan

2020-12-29

輕松學會TensorFlow 2.0人工智能深度學習應用開發 (Easily Learn TensorFlow 2.0: Artificial Intelligence and Deep Learning Application Development)

Authors: 黃士嘉, 林邑撰

2020-12-29